The Ontological Key: Automatically Understanding and Integrating Forms to Access the Deep Web

Abstract

Forms are our gates to the web. They enable us to access the deep content of web sites. Automatic form understanding provides applications, ranging from crawlers over meta-search engines to service integrators, with a key to this content. Yet, it has received little attention other than as a component in specific applications such as crawlers or meta-search engines. No comprehensive approach to form understanding exists, let alone one that produces rich models for semantic services or integration with linked open data.

In this paper, we present OPAL, the first comprehensive approach to form understanding and integration. We identify form labeling and form interpretation as the two main tasks involved in form understanding. On both problems OPAL pushes the state of the art: For form labeling, it combines features from the text, structure, and visual rendering of a web page. In extensive experiments on the ICQ and TEL-8 benchmarks and a set of 200 modern web forms, OPAL outperforms previous approaches for form labeling by a significant margin. For form interpretation, OPAL uses a schema (or ontology) of forms in a given domain. Thanks to this domain schema, it is able to produce nearly perfect (more than 97 percent accuracy in the evaluation domains) form interpretations. Yet, the effort to produce a domain schema is very low, as we provide a Datalog-based template language that eases the specification of such schemata and a methodology for deriving a domain schema largely automatically from an existing domain ontology. We demonstrate the value of the form interpretations in OPAL through a light-weight form integration system that successfully translates and distributes master queries to hundreds of forms with no error, yet is implemented with only a handful of translation rules.
